Abstract

We examine the scaling regime of detrended fluctuation analysis (DFA), the most
popular method used to detect long memory in data and the fractal structure of
time series. First, the scaling range of DFA is studied for uncorrelated data as a
function of the length $L$ of the time series and the regression line coefficient $R^2$
at various confidence levels. Next, an analysis of artificial short series with long
memory is performed. In both cases the scaling range $\lambda$ is found to change
linearly with both $L$ and $R^2$. We show how this dependence can be generalized
to a simple unified model describing the relation $\lambda = \lambda(L, R^2, H)$, where
$H$ ($1/2 < H < 1$) stands for the Hurst exponent of long-range autocorrelated data.
Our findings should be useful in all applications of the DFA technique, particularly
for instantaneous (local) DFA, where an enormous number of short time series has
to be examined at once, without the possibility of a preliminary check of the scaling
range of each series separately.
1 Introduction and description of the method



Detrended fluctuation analysis (DFA) [1, 2, 3] is now considered the main tool in searching
for fractal [4, 5, 6], multifractal [7, 8] and long memory effects in ordered data. More
than one thousand articles have been published on DFA and its applications so far. The
detrended technique has been widely applied to various topics, to mention just a few:
genetics (see e.g. [9, 10, 11]), meteorology (see e.g. [12, 13, 14]), cardiac dynamics (see
e.g. [15, 16]), astrophysics (see e.g. [17]), finance (see e.g. [18, 19, 20, 21, 22, 23, 24])
and many others. The indisputable advantage of DFA over other available methods of
searching for the Hurst exponent $H$ [25, 26] in series of data, like the rescaled range
method (R/S) [3 [23 I2ni [2Z], is that DFA has been shown to be resistant, to some extent,
to non-stationarities in time series [28].

We will not describe the DFA technique in detail here, since this has been done in many
other publications (see e.g. [29, 30, 31, 32]). Instead, we focus on the issues that are
relevant to the so-called scaling range, which is the subject of this article.

Briefly, the DFA method consists of the following steps: (i) the time series $x(t)$
($t = 1, 2, ..., L$) of data (a random walk) is divided into non-overlapping boxes (time
windows) of length $\tau$ each; (ii) the linear trend^1 is found within each box and then
subtracted from the signal, giving the so-called detrended signal $\hat{x}(t)$; (iii) the
mean-square fluctuation $F^2(\tau)$ of the detrended signal is calculated in each box and
then averaged over all boxes of size $\tau$; (iv) the procedure is repeated for all box
sizes $\tau$ ($1 < \tau < L$).
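Steps (i)-(iv) can be sketched in a few lines of Python. This is only a minimal
illustration under stated assumptions, not the implementation used in this study: the
function name `dfa_fluctuation` is hypothetical, NumPy is assumed, the input is taken to
be the random walk itself, and the uncovered tail of the series is simply dropped (the
edge treatment discussed later in this section is not included).

```python
import numpy as np

def dfa_fluctuation(walk, box_sizes):
    """Return the box-averaged mean-square fluctuation F^2(tau) for each tau."""
    walk = np.asarray(walk, dtype=float)
    f2 = []
    for tau in box_sizes:                       # step (iv): loop over box sizes
        n_boxes = len(walk) // tau              # step (i): non-overlapping boxes
        boxes = walk[:n_boxes * tau].reshape(n_boxes, tau)
        t = np.arange(tau)
        # step (ii): least-squares line in every box (one column per box)
        coeffs = np.polyfit(t, boxes.T, 1)      # shape (2, n_boxes): slope, intercept
        trend = np.outer(t, coeffs[0]) + coeffs[1]
        detrended = boxes.T - trend
        # step (iii): mean-square fluctuation, averaged over all boxes of this size
        f2.append(np.mean(detrended ** 2))
    return np.array(f2)

rng = np.random.default_rng(0)
walk = np.cumsum(rng.standard_normal(4000))     # uncorrelated increments
taus = np.array([8, 16, 32, 64, 128])
f2 = dfa_fluctuation(walk, taus)
```

Because all boxes of a given size have the same length, averaging the squared residuals
over the whole array equals averaging the per-box mean-square fluctuations.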

One expects that the power law

$$\langle F^2(\tau) \rangle_{box} \sim \tau^{2H} \tag{1}$$

is fulfilled for a stationary signal^2, where $\langle \cdot \rangle_{box}$ is the expectation
value; here, the average taken over all boxes of size $\tau$. This equation allows one to
make a linear fit in log-log scale and extract the value of the exponent $H$ needed in
various applications. Alternatively, one can view the above relationship as a link between
the variance of the detrended random walk $\hat{x}(t)$ and its duration time $t$, i.e.
$\langle \hat{x}^2(t) \rangle \sim t^{2H}$, which reflects the precise definition of the Hurst
exponent in stochastic processes. The exponent $H$ clearly indicates the nature of the
randomness of this process. One deals with uncorrelated steps in the data series if
$H = 1/2$, while for other values of $H$ these steps are, respectively, anticorrelated
($0 < H < 1/2$) or autocorrelated with (positive) long memory ($1/2 < H < 1$).
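The log-log fit of Eq. (1) can be illustrated as follows. This is a sketch under
assumptions: the helper name `hurst_from_f2` is hypothetical, and the regression line
coefficient is taken to be the ordinary coefficient of determination $R^2$ of the straight
line fitted to $\log F^2$ versus $\log \tau$. The input below is an exact synthetic power
law, not simulated data.

```python
import numpy as np

def hurst_from_f2(taus, f2):
    """Fit log F^2 = 2H log tau + c and return (H, R^2) of the regression line."""
    x, y = np.log(taus), np.log(f2)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return slope / 2.0, r2                  # the slope equals 2H in Eq. (1)

taus = np.array([8, 16, 32, 64, 128, 256])
f2 = 0.1 * taus ** (2 * 0.7)                # synthetic exact power law with H = 0.7
H, r2 = hurst_from_f2(taus, f2)
```

For real data the fit is restricted to box sizes inside the scaling range, which is why
the choice of the largest admissible box size matters.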

The edge part of a time series is usually not covered by any box. Some authors suggest
overcoming this difficulty by performing DFA in two opposite directions along the time
series, i.e. with increasing and then with decreasing time arrow (see e.g. [33]). The
average of the mean-square fluctuations from these two divisions is then used to evaluate
the properties of the time series.

We proposed another solution in Refs. [34, 35]. If the remaining part of the time series,
$\Delta L$, has the length $\tau/2 < \Delta L < \tau$, we cover it with an additional box of
size $\tau$ that partly overlaps the preceding data. If $\Delta L < \tau/2$, we do not take
into account the part of the data contained in $\Delta L$. This recipe is particularly
useful in the 'local' version of DFA [18, 34, 35, 36, 37, 38, 39], where the time arrow is
important. Throughout this article we apply the latter approach.

^1 The subtracted trend can also be modelled by a nonlinear polynomial function of order $k$
in the so-called DFA-$k$ schemes; we will not discuss this issue in detail here.

^2 This property also holds for non-stationary, positively autocorrelated ($H > 1/2$) time
series [28].
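The covering rule for the remaining part $\Delta L$ can be written down compactly. The
sketch below is illustrative only: the name `box_starts` is hypothetical, and the boundary
case $\Delta L = \tau/2$, which the text leaves open, is treated here as covered (an
assumption).

```python
def box_starts(L, tau):
    """Start indices of the boxes of size tau used to cover a series of length L."""
    n_full = L // tau
    starts = [k * tau for k in range(n_full)]   # regular non-overlapping boxes
    remainder = L - n_full * tau                # this is Delta L
    # Assumption: Delta L == tau/2 counts as "long enough" to deserve a box.
    if n_full > 0 and 2 * remainder >= tau:
        starts.append(L - tau)                  # extra box overlapping preceding data
    return starts

covered = box_starts(100, 35)    # Delta L = 30 > 35/2, so an extra box is added
skipped = box_starts(100, 30)    # Delta L = 10 < 30/2, so the tail is ignored
```

Here `covered` is `[0, 35, 65]` and `skipped` is `[0, 30, 60]`; the last box in `covered`
ends exactly at the final data point, so the most recent data are always included.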

If time series were infinitely long, the formula in Eq. (1) would hold for all $\tau$.
In practice, however, we always deal with finite, and sometimes rather short, time series.
This is particularly the case for the already mentioned instantaneous (local) DFA, where
one wants to track the dynamics of fractal properties changing in time and/or the
time-dependent long memory in data. When only a small number of boxes covers the time
series (i.e. for large $\tau$), the scaling of Eq. (1) is not revealed because of poor
statistics. In other words, we are allowed to take $\tau$ only within some range
$\tau_{min} < \tau < \tau_{max}$, called the scaling range. Within this range one expects
a "sufficiently good" performance of the power law, which permits the extraction of the
exponent $H$ via a linear fit. But what exactly does "sufficiently good" mean? Most authors
end up with $\tau_{max} \simeq L/4$, where $L$ is the total length of the considered data.
Is this scaling range still acceptable, or already too large? This problem is usually
sidestepped in the literature, although it does affect the final results. The aim of this
and a forthcoming article [40] is to confront this issue. Our approach differs from the
one published in [41, 42, 43]. The goal is to find the qualitative and quantitative
dependence between the scaling range $\lambda = \tau_{max}$ and the main parameters of a
time series: its length, the level of long memory described by the Hurst exponent $H$, and
the goodness of the linear fit in log-log scale implied by Eq. (1). The latter is usually
measured by the regression line coefficient $R^2$. All this can be done at a desired
confidence level ($CL$), indicating the minimal fraction of time series that fulfil the
functional dependence $\lambda = \lambda(L, R^2, H)$. We derive this relation below.

Throughout this paper we assume $\tau_{min} = 8$, because below this threshold a
significant lack of scaling in DFA is observed, caused by artificial autocorrelations that
emerge when the bursts of data in boxes of size $\tau$ are too short. We start with the
analysis of uncorrelated data in the next section and proceed to long-memory correlated
time series in Section 3. Section 4 derives a unified formula for the scaling range versus
$L$ and for all $H > 1/2$. Although the presented considerations concern the DFA method
exclusively, they can easily be extended to other detrended methods introduced in the
literature, in particular to those based on moving averages [44, 45, 46, 47]. That analysis
is left to another publication [40].

2 DFA scaling ranges for uncorrelated data 

The starting point for the entire search is the statistical analysis of an ensemble of
artificially generated time series of a given length. For this ensemble we find the
percentage of series whose regression line fit parameter $R^2$ falls below a specified
level. This rate obviously depends on the maximal box size $\tau_{max}$: the larger
$\tau_{max}$, the larger the percentage of series not matching the assumed criterion for
$R^2$. Fig. 1 illustrates this fact for two series lengths, L = 10^ and L = 3 x 10^, of
uncorrelated data increments ($H = 1/2$) drawn from the normalized Gaussian distribution.
The rejection rate, i.e. the percentage of series not matching the assumed criterion for
$R^2$, is shown there for different values of $R^2$.

We took two particular values of the rejection rate for further analysis, 2.5% and 5.0%,
corresponding to the confidence levels $CL = 97.5\%$ and $CL = 95.0\%$, respectively. All
data were gathered numerically on a set of 5 x 10^ artificially generated time series with
lengths in the range 5 x 10^ < L < 2 x 10^ for the above-stated confidence levels. The
value of $\tau_{max}$ corresponding to the required $CL$ and a given $R^2$ is identified
with the scaling range and denoted by $\lambda$.
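The way a scaling range is read off from the rejection rate can be sketched as below.
This is an illustration under assumptions rather than the procedure used to produce the
figures: the name `scaling_range` is hypothetical, the $R^2$ values are assumed to be
precomputed for each series and each candidate $\tau_{max}$, and the search stops at the
first $\tau_{max}$ whose rejection rate exceeds $100\% - CL$ (the rate is assumed to grow
with $\tau_{max}$). The small table of $R^2$ values is made up purely for the example.

```python
import numpy as np

def scaling_range(tau_max_grid, r2, r2_min, cl=97.5):
    """Largest tau_max whose rejection rate stays within (100 - cl) percent.

    r2[i, k] is the R^2 of the log-log fit for series i when boxes up to
    tau_max_grid[k] are used. Returns (lambda or None, rejection rates in %).
    """
    rejection = 100.0 * np.mean(r2 < r2_min, axis=0)
    ok = rejection <= 100.0 - cl
    if not ok[0]:
        return None, rejection                  # no scaling range at this CL
    first_bad = len(ok) if ok.all() else int(np.argmin(ok))
    return int(tau_max_grid[first_bad - 1]), rejection

grid = np.array([50, 100, 150, 200])
# toy R^2 table for 4 series (hypothetical numbers, for illustration only)
r2 = np.array([[0.99, 0.99, 0.98, 0.95],
               [0.99, 0.98, 0.98, 0.97],
               [0.99, 0.99, 0.97, 0.96],
               [0.99, 0.99, 0.99, 0.99]])
lam, rates = scaling_range(grid, r2, r2_min=0.98, cl=75.0)
```

With this toy input the rejection rates are 0%, 0%, 25% and 75%, so at the artificially
low $CL = 75\%$ the function returns 150; a realistic ensemble is needed before levels
such as 97.5% become meaningful.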

Introducing for convenience a new parameter $u = 1 - R^2$, we may search for the
$\lambda(L)$ dependence for L < 2 x 10^, for different values of $u$ and for the selected
confidence levels. The results are presented in Figs. 2 and 3 and reveal a very good
linear relationship between the scaling range and the length of uncorrelated data^3:

$$\lambda(u, L) = A(u) L + B(u) \tag{2}$$

The functional dependence of the coefficients $A(u)$ and $B(u)$ on $u$ has to be specified
further from the regression line fit of the above equation. This procedure yields the
values of $A$ and $B$ for the range of $u$ parameters, gathered in Fig. 4.

We see from these graphs that $A(u)$ is again linear for both $CL = 97.5\%$ and
$CL = 95\%$, while the value of $B$ varies very weakly with $u$, which justifies taking
$B(u) = b = const$.

Ultimately, the foregoing considerations lead to the following simple formula describing
the full dependence of the scaling range on $L$ and $u$:

$$\lambda(u, L) = (a u + a_0) L + b \tag{3}$$

with some unknown constants $a$, $a_0$ and $b$ to be fitted.

We fitted Eq. (3) by requiring minimization of the mean absolute error (MAE) and,
simultaneously, minimization of the maximal relative error (ME) over the fitting
points^4 $(L_i, u_j)$. The MAE, denoted $\Delta_{MAE}(\lambda)$, is defined as

$$\Delta_{MAE}(\lambda) = \frac{1}{N_{(ij)}} \sum_{(ij)} \left| \frac{\lambda^{sim}_{ij}(L,u) - \lambda_{ij}(L,u)}{\lambda_{ij}(L,u)} \right| \tag{4}$$

where $\lambda_{ij}(L,u) = \lambda(L_i, u_j)$ is taken from Eq. (3) for the particular
choice $L = L_i$ and $u = u_j$, while $\lambda^{sim}_{ij}(L,u)$ is the respective value
simulated numerically for the given ensemble of time series, and $N_{(ij)}$ counts the
different $(ij)$ pairs.



^3 Obviously $\lambda(L, u) \in \mathbb{Z}$, so in fact only the integer part of the RHS of
Eq. (2) should be taken when determining $\lambda(L, u)$.

^4 We considered $u_j = 5 \times 10^{-3}(1 + j)$, where $j = 1, 2, ..., 9$, and $L_i$
covering the range from L = 5 x 10^ up to L = 2 x 10^, as indicated on the plots.
Similarly, the ME, denoted below as $\Delta_{ME}$, is simply defined as

$$\Delta_{ME} = \max_{(ij)} \left( \frac{\left| \lambda^{sim}_{ij}(L,u) - \lambda_{ij}(L,u) \right|}{\lambda_{ij}(L,u)} \right) \tag{5}$$

Note that some pairs $(L_i, u_j)$ are not permitted by the specific $CL$ demand^5, as is
already seen in Fig. 1. These points are therefore absent from Figs. 2-3 and 5-6.
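The two error measures of Eqs. (4) and (5) are straightforward to evaluate once the
simulated and fitted scaling ranges are available for the permitted pairs. The sketch
below assumes NumPy and uses a hypothetical helper name `fit_errors`; the three sample
values are invented for illustration and are not results from this study.

```python
import numpy as np

def fit_errors(lam_sim, lam_fit):
    """Return (MAE, ME): mean and maximal relative deviation of the fit, Eqs. (4)-(5)."""
    lam_sim = np.asarray(lam_sim, dtype=float)
    lam_fit = np.asarray(lam_fit, dtype=float)
    rel = np.abs((lam_sim - lam_fit) / lam_fit)   # relative error for each (i, j) pair
    return rel.mean(), rel.max()

# hypothetical simulated vs. fitted scaling ranges for three (L_i, u_j) pairs
mae, me = fit_errors([102, 196, 400], [100, 200, 400])
```

Both quantities are relative errors, so the "mean absolute error" of Eq. (4) is in fact
the mean of the absolute relative deviations.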

The fitting procedure led to the values of the parameters in Eq. (3) gathered in Table 1.
Exemplary results for the scaling ranges over a wide spread of $L$ and $u$ values are
presented graphically in Figs. 5 and 6. Whenever $\lambda(L, u)$ comes out negative in the
fitted formula for a particular length of the series, this should be interpreted as a lack
of a scaling range at the given confidence level $CL$ for the required value of the
regression line coefficient $R^2$ within DFA.



3 DFA scaling range for long-memory correlated data 

The analysis presented in the previous section can be extended to time series manifesting
long memory. Series with $0.5 < H < 0.9$ are of particular interest, since they correspond
to the long-range autocorrelated data one often meets in practice in various areas.

To construct such signals we used the Fourier filtering method (FFM) [18]. The level of
autocorrelation in this approach is directly modulated by the choice of the autocorrelation
function $C(s)$, which for stationary series with long memory satisfies the known power
law Hn]:

$$C(s) = \langle \Delta x(t + s) \Delta x(t) \rangle \simeq H(2H - 1) s^{2H-2} \tag{6}$$

where $\Delta x(t) = x(t + 1) - x(t)$ ($t = 1, 2, ..., L - 1$) are the increments of the
discrete time series, $s$ is the time lag between observations, $H$ is the Hurst exponent
[25, 26], and the average is taken over all data in the series.
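A basic variant of Fourier filtering can be sketched as follows. It is not necessarily
the exact FFM variant used here: the sketch assumes the simplest recipe, in which white
Gaussian noise is filtered in the frequency domain so that its power spectrum behaves as
$f^{-(2H-1)}$, the spectral counterpart of the $s^{2H-2}$ decay of $C(s)$. The function
name `ffm_increments` is hypothetical.

```python
import numpy as np

def ffm_increments(n, hurst, rng):
    """Gaussian increments with power spectrum ~ f^-(2H-1), via Fourier filtering."""
    white = rng.standard_normal(n)
    spectrum = np.fft.rfft(white)
    freqs = np.fft.rfftfreq(n)
    filt = np.zeros_like(freqs)
    # amplitude filter is the square root of the target power spectrum;
    # the zero-frequency mode is removed, so the increments have zero mean
    filt[1:] = freqs[1:] ** (-(2.0 * hurst - 1.0) / 2.0)
    coloured = np.fft.irfft(spectrum * filt, n)
    return coloured / coloured.std()            # normalise to unit variance

rng = np.random.default_rng(1)
dx = ffm_increments(4096, 0.8, rng)             # increments Delta x(t)
x = np.cumsum(dx)                               # long-memory random walk x(t)
```

For $H = 1/2$ the filter is flat and the output reduces to centred white noise, so the
same routine covers the uncorrelated case.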

We start with an analysis similar to the one shown in Fig. 1 for uncorrelated data. Fig. 7
presents an example plot made for an ensemble of 5 x 10^ autocorrelated signals of
length L = 10^ with $H = 0.7$. The percentage of rejected time series not satisfying the
assumed goodness of the DFA fit is shown there for several distinct $R^2$ as a function of
the maximal box size $\tau_{max}$. The outcome of such an analysis for a range of simulated
data lengths and for various Hurst exponents can be collected in a number of plots, as in
Figs. 8a, 9a for $\lambda(L)$ and in Figs. 8b, 9b for the $\lambda(u)$ dependence. To keep
the figures readable and due to lack of space, only plots for $u = 0.02$ and L = 10^ are
shown. The relations for other values look qualitatively the same. Given the results of
the previous section, it is not surprising that these relationships are again linear. Thus
the formula in Eq. (2) is more general, and the coefficients $A(u)$ and $B(u)$ are linear
functions of $u$ also for series with memory. These relationships are drawn in detail for
$H = 0.6, 0.7, 0.8$ in Fig. 10. In particular, we notice from Fig. 10b that the coefficient
$B(u)$ for autocorrelated data behaves similarly to what was observed in the previous
section for uncorrelated signals, i.e. $B(u)$ remains almost constant as a function of $u$.
Moreover, its dependence on $H$ is also negligible. Thus, the formula postulated in Eq. (3)
applies also to autocorrelated data, with the coefficients $a$, $a_0$ and $b$ to be fitted
independently for each $H$.

^5 We found that the following pairs: ($L < 3000$, $u_1$), ($L < 1800$, $u_2$),
($L < 1500$, $u_3$), ($L < 1000$, $u_4$), ($L < 1000$, $u_5$), ($L < 800$, $u_6$),
($L < 800$, $u_7$), ($L < 600$, $u_8$) do not match the $CL = 97.5\%$ requirement, and:
($L < 2400$, $u_1$), ($L < 1800$, $u_2$), ($L < 1200$, $u_3$), ($L < 1000$, $u_4$),
($L < 800$, $u_5$), ($L < 800$, $u_6$), ($L < 600$, $u_7$) do not match the $CL = 95\%$
demand.

Table 1: Results of the best fit for the coefficients in Eq. (3) found for series with
various autocorrelation levels measured by the exponent $H$ and for the two chosen
confidence levels: 97.5% and 95%. The accuracies of the fitted parameters are,
respectively: Δa = ±10^, Δa_0 = ±10^, Δb = 0.

We performed such a fit for series with long memory, assuming the same criteria for MAE
and ME as previously. The results are collected in Table 1 for two confidence levels and
are shown graphically in Figs. 11-16. These figures generalize the plots shown for
$H = 0.5$ in Figs. 5 and 6. The extremely good linear relationship of Eq. (3) holds for
autocorrelated signals up to L = 10^. Only for highly autocorrelated series ($H > 0.8$) or
very long ones (L > 10^) did we notice a slight departure from the linear dependence^6.

4 Towards unified model of scaling ranges 

Finally, we should investigate whether there exists a unified formula with a minimal
number of free parameters that is able to describe all scaling ranges of both uncorrelated
and autocorrelated data. So far we know that Eq. (3), with parameters fitted according to
Table 1, describes the $\lambda(L, u)$ dependence very well for a given $H$. We should
therefore discuss the form of the relationships $a(H)$, $a_0(H)$ and $b(H)$ in the relation

$$\lambda(u, L, H) = (a(H) u + a_0(H)) L + b(H) \tag{7}$$

Looking at the bottom panels of Figs. 4 and 10, one sees immediately that the assumption
$b(H) = const$ can be justified. Similarly, we may easily notice from the data collected
in Table 1 that $a_0(H)/(a(H) u)$ < O(10^). This means that the component $a(H) u$ gives
the leading contribution to the linear factor $a(H) u + a_0(H)$ in Eq. (7) for each value
of $H$; therefore, one should focus mainly on the $a(H)$ dependence depicted in Fig. 17.
The latter relationship also appears to be linear, which allows us to write Eq. (7) in its
simplest unified form, containing only four free parameters ($\alpha$, $\beta$, $a_0$,
$\gamma$), as follows:

$$\lambda(u, L, H) = ((\alpha H + \beta) u + a_0) L + \gamma \tag{8}$$

^6 The scaling ranges predicted by Eq. (3) were nevertheless lower in these cases than the
ones coming from the direct simulation.

Table 2: Results of the best fit for the coefficients in the unified formula of Eq. (8).
The fit was done for all data coming from the investigated series, separately for the two
chosen confidence levels: 97.5% and 95%. The estimated accuracies of the fitted parameters
are: Δα = Δβ = 10^-, Δa_0 = 10^-, Δγ = 0.

Demanding minimization of MAE and ME when fitting Eq. (8) to all data points
$\lambda^{sim}(L_i, u_j, H)$ indicated in the previous sections, we arrive at the best-fit
values of these free parameters shown in Table 2. The unified formula can be particularly
useful for interpolation to arbitrary autocorrelation levels $1/2 < H < 1$.
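In practice the unified formula of Eq. (8) can be wrapped in a small helper. The sketch
below is illustrative: the function name `unified_scaling_range` is hypothetical, and the
parameter values in the example are placeholders chosen only to exercise the code; they
are NOT the fitted values of Table 2, which must be substituted by the reader. The helper
takes the integer part of the result and reports the absence of a scaling range when the
formula gives a negative value.

```python
import math

def unified_scaling_range(L, u, H, alpha, beta, a0, gamma):
    """Eq. (8): integer part of ((alpha*H + beta)*u + a0)*L + gamma.

    Returns None when the formula is negative, i.e. when no scaling range
    exists at the confidence level the parameters were fitted for.
    """
    value = ((alpha * H + beta) * u + a0) * L + gamma
    return math.floor(value) if value >= 0 else None

# Placeholder parameters (hypothetical, not taken from Table 2).
params = dict(alpha=2.0, beta=3.0, a0=-0.01, gamma=-50.5)
lam_long = unified_scaling_range(L=1000, u=0.02, H=0.7, **params)
lam_short = unified_scaling_range(L=200, u=0.02, H=0.7, **params)
```

With these placeholder numbers `lam_long` is 27, while `lam_short` is `None` because the
formula becomes negative for the shorter series.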

In fact, the fit based on Eq. (8) is of the same quality as the one produced by Eq. (3)
(see Tables 1 and 2 to compare the MAE and ME errors). The difference between the two
fitting methods is so negligible that it cannot be noticed graphically. Therefore the
fitting lines shown in Figs. 11-16 describe equally well the unified model based on data
from Table 2 and the 'local' fit based on data from Table 1. We may also easily conclude
from Eq. (8) that the average relative change in the scaling range,
$\delta\lambda(\delta H)/\lambda(H)$, due to a small change $\delta H$ in the Hurst
exponent is given as

and varies from 3% (at $R^2 = 0.99$) to 10% (at $R^2 = 0.97$) for any change
$\delta H \simeq 0.1$ in the investigated signal.
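As a worked step, the relative change can be obtained by differentiating Eq. (8) directly.
The expression below is only what follows from Eq. (8) as written; it is a sketch and is
not claimed to be the exact (possibly averaged) form of Eq. (9). Only the term
$\alpha H u L$ depends on $H$, so for a small change $\delta H$

$$\frac{\delta\lambda(\delta H)}{\lambda(H)} = \frac{\alpha\, u\, L\, \delta H}{((\alpha H + \beta) u + a_0) L + \gamma} \approx \frac{\alpha\, u\, \delta H}{(\alpha H + \beta) u + a_0}$$

where the second form assumes that $\gamma$ is negligible compared with the term
proportional to $L$ (an assumption valid only for sufficiently long series). The result
depends on $u = 1 - R^2$ through $a_0$ and $\gamma$, which is why the relative change is
quoted separately for different values of $R^2$.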



5 Discussion and Conclusions 

In this study we searched for the scaling range properties of the fundamental power law
between the fluctuations $F^2(\tau)$ of a detrended random walk and the length $\tau$ of
the time window in which such fluctuations are measured. This power law, proposed within
the DFA technique, gives important information about the nature of randomness in a
stochastic process via the link between the scaling exponent $H$ and the autocorrelation
exponent between the steps of the random walk. Therefore, precise knowledge of how the
scaling range depends on the other parameters involved is an important task and has an
impact on the final outcomes of the DFA power law quoted in Eq. (1). We ran our simulations
on an ensemble of 5 x 10^ short and medium-length time series with 5 x 10^ < L < 10^. We
also varied their autocorrelation properties in order to reflect the properties of real
random-walk signals commonly found in nature.

First, it has been found that for an uncorrelated process the scaling range $\lambda$ of
the DFA power law is a perfect linear function of the data length and of the goodness of
the linear fit to the power law of Eq. (1). Moreover, this linear relationship extends to
time series with long memory. The uniform shape of the $\lambda(L, u)$ dependence for
different memory levels in data raises the question of whether a single simple formula
exists that describes the dependence of the scaling range on all parameters in play, i.e.
$\lambda(L, u, H)$. We found such a formula and showed that it fits the data obtained from
numerical simulations no worse than the formulas found earlier in this article for the
$\lambda(L, u)$ dependence at separate values of $H$. The unified formula contains only
four free parameters, which were calculated with high precision and are presented in
Table 2. We also showed that the scaling range grows with the level of long memory present
in the time series, on average by 3-10% for every $\delta H = 0.1$ (see Eq. (9)). This
rather slight increase in the scaling range for series with memory, compared with
uncorrelated data, may entitle us to approximate the scaling range for series with long
memory by the model for uncorrelated data, i.e. with $H = 1/2$. The presented results can
therefore be considered as the lower limit for the DFA scaling range.

The relations we found are strikingly simple and provide a useful recipe for determining
scaling ranges, especially for short time series, wherever one needs to consider very
large data sets arranged in shorter subseries. In particular, these results can be used in
the search for an evolving (time-dependent) local Hurst exponent in a large number of
moving time windows. The extension of this approach to other techniques of fluctuation
analysis (FA) can also be done [40].
Figure 1: Percentage rate (%) of rejected time series whose scaling range does not provide
the indicated goodness of the regression line fit in the DFA procedure. Results are based
on an ensemble of 5 x 10^ time series and are drawn as a function of the maximal box size
$\tau_{max}$ for two lengths of uncorrelated data: (a) L = 10^ and (b) L = 3 x 10^.

Figure 2: Dependence between the scaling range $\lambda$ as in Fig. 1 and: (a) the time
series length $L$ or (b) the goodness of the linear fit $u = 1 - R^2$. Examples of the
$\lambda(L, u = fixed)$ and $\lambda(u, L = fixed)$ relations for various $u$ and $L$
values are shown at the 97.5% confidence level.

Figure 3: Same as in Fig. 2 but at the 95% confidence level.

Figure 4: (a) Dependence between the fitted coefficient $A$ in Eq. (2) and the goodness of
the DFA fit $u = 1 - R^2$ for uncorrelated data ($H = 0.5$), shown for two distinct
confidence levels: $CL = 97.5\%$ and $CL = 95.0\%$. (b) Same for the coefficient $B$ in
Eq. (2).

Figure 5: Best fit results of the relationship suggested in Eq. (3) at the $CL = 97.5\%$
level. Continuous lines represent the fit of Eq. (3) to the data (marked as points).
(a) Results shown for $\lambda(L, u = fixed)$ with $R^2 = 1 - u = 0.96, 0.97, 0.98, 0.985$.
(b) Results for $\lambda(L = fixed, u)$ shown for chosen lengths of uncorrelated data
L = 10^, 3 x 10^, 6 x 10^, 10^. Parameters of the fits are gathered in Table 1.

Figure 6: Same as in Fig. 5 but at the $CL = 95\%$ level.

Figure 7: Percentage rate (%) of rejected time series as a function of the maximal box size
$\tau_{max}$. Exemplary results are based on an ensemble of 5 x 10^ time series for an
autocorrelated signal of length L = 10^ with $H = 0.7$.

Figure 8: Dependence between the scaling range $\lambda$ and: (a) the time series length
$L$ or (b) the goodness of the linear fit $u = 1 - R^2$ for autocorrelated signals. The
autocorrelation level is indicated by the Hurst exponent $H$. Plots are shown only for
L = 10^ and $u = 0.02$ ($R^2 = 0.98$), but the relations $\lambda(L, u)$ for the remaining
values of $L$ and $u$ look qualitatively very similar (not shown). The linear relationship
$\lambda(L, u)$ is seen at the $CL = 97.5\%$ level.



[Plot for Figure 10: panels (a) and (b); horizontal axis u from 0.00 to 0.06; legend:
H = 0.6, H = 0.7, H = 0.8; CL = 97.5%]

Figure 10: (a) Dependence between the coefficient $A$ in Eq. (2) and the goodness of the
DFA fit $u = 1 - R^2$ for autocorrelated data ($H = 0.6$, $H = 0.7$, $H = 0.8$). The
confidence level $CL = 97.5\%$ is considered, but $CL = 95\%$ looks similar (not shown).
(b) Same for the coefficient $B$ in Eq. (2).
Figure 11: Best fit results of Eq. (7) found for simulated series of autocorrelated data at
the $CL = 97.5\%$ level and shown for $L < 3000$. Continuous lines represent the fit of
Eq. (7) to data points marked as dots for chosen $u$. The cases of other $u$ values (not
shown due to lack of space) look identical. Parameters of the fit are gathered in Table 2.

Figure 12: Same as in Fig. 11 for $CL = 95\%$.

Figure 13: Same dependence as in Fig. 11 shown for longer series of data (L < 10^). Only
the cases $u = 0.015$ and $u = 0.030$ are presented due to lack of space. Plots for other
coefficients look qualitatively the same.

Figure 14: Same as in Fig. 13 for $CL = 95\%$.

Figure 15: Best fit results of Eq. (7) found for simulated series of autocorrelated data at
$CL = 97.5\%$ as in Fig. 11 and shown as a function of $u$. The cases of several lengths of
data are shown. Other $L$ values (not shown due to lack of space) also show a linear
dependence on $u$. Parameters of the fits are gathered in Table 2.

Figure 16: Same as in Fig. 15, but for $CL = 95\%$.